Efficient XQuery Support for Stand-Off Annotation
Abstract
XML annotations occur widely in many application fields, and XML databases should be used to store and query such data. To provide intuitive and fast querying of annotations, we make a case for extending XPath with four new axis steps that correspond to so-called StandOff joins, which we introduce here. The new steps can be implemented efficiently using a region index and fast loop-lifted StandOff MergeJoin algorithms. We added these techniques to the open-source XML DBMS MonetDB/XQuery, and our evaluation shows that the system thereby becomes capable of interactively querying annotation databases larger than a gigabyte.
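The abstract does not define the four steps, so the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes each annotation element carries a closed integer interval [start, end] over the base text or media, that a "narrow" step keeps candidate regions contained in some context region and a "wide" step keeps those that overlap one, and that "reject" returns what the matching "select" would drop. The names `Region`, `stand_off_step`, `wide` and `reject` are hypothetical. The sweep is a simple merge over sorted regions in the spirit of a merge join, not the StandOff MergeJoin itself; loop-lifting (evaluating many iterations at once) and the region index are omitted.

```python
from typing import Iterable, List, NamedTuple, Set


class Region(NamedTuple):
    """Hypothetical annotation element plus the closed interval [start, end]
    it spans in the base text or media (integer positions assumed)."""
    node: str
    start: int
    end: int


def _narrow_hits(contexts: List[Region], candidates: List[Region]) -> Set[Region]:
    """Candidates contained in at least one context region."""
    ctx = sorted(contexts, key=lambda r: r.start)
    hits: Set[Region] = set()
    i, max_end = 0, float("-inf")
    # Sweep candidates in start order. Every admitted context starts at or
    # before the current candidate, so the candidate is contained in one of
    # them exactly when the largest admitted end reaches the candidate's end.
    for cand in sorted(candidates, key=lambda r: r.start):
        while i < len(ctx) and ctx[i].start <= cand.start:
            max_end = max(max_end, ctx[i].end)
            i += 1
        if max_end >= cand.end:
            hits.add(cand)
    return hits


def _wide_hits(contexts: List[Region], candidates: List[Region]) -> Set[Region]:
    """Candidates overlapping at least one context region."""
    ctx = sorted(contexts, key=lambda r: r.start)
    hits: Set[Region] = set()
    i, max_end = 0, float("-inf")
    # Sweep candidates in end order. Admitted contexts start at or before the
    # candidate's end, so one of them overlaps it exactly when the largest
    # admitted end reaches the candidate's start.
    for cand in sorted(candidates, key=lambda r: r.end):
        while i < len(ctx) and ctx[i].start <= cand.end:
            max_end = max(max_end, ctx[i].end)
            i += 1
        if max_end >= cand.start:
            hits.add(cand)
    return hits


def stand_off_step(contexts: Iterable[Region], candidates: Iterable[Region],
                   wide: bool = False, reject: bool = False) -> List[Region]:
    """Hypothetical stand-off step: narrow = containment, wide = overlap.
    Results are returned in (start, end) order."""
    contexts, candidates = list(contexts), list(candidates)
    hits = (_wide_hits if wide else _narrow_hits)(contexts, candidates)
    ordered = sorted(candidates, key=lambda r: (r.start, r.end))
    # Select keeps the hits; reject keeps everything else.
    return [c for c in ordered if (c in hits) != reject]
```

For example, with context sentences s1 = [0, 10] and s2 = [20, 30] and candidate words w1 = [2, 5], w2 = [8, 12], w3 = [15, 18] and w4 = [25, 28], the narrow select keeps w1 and w4, the wide select additionally keeps w2 (which straddles the end of s1), and the wide reject keeps only w3. Because the set of admitted contexts only grows during each sweep, a single running maximum suffices, so after sorting each step is one linear pass; an index that already returns regions in sorted order would remove the sorting cost.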
Similar resources
Multi-dimensional Annotation and Alignment in an English-German Translation Corpus
This paper presents the compilation of the CroCo Corpus, an English-German translation corpus. Corpus design, annotation and alignment are described in detail. In order to guarantee the searchability and exchangeability of the corpus, XML stand-off mark-up is used as representation format for the multi-layer annotation. On this basis it is shown how the corpus can be queried using XQuery. Furth...
Schema Validation and Type Annotation for Encoded Trees
We argue that efficient support for schema validation and type annotation in XQuery processors deserves as much attention as efficient evaluation techniques for XPath queries have received in the past. To this end, we describe a validation procedure that operates on an encoding of trees that has already been successfully used for XPath location step evaluation. The validation algorithm works wit...
Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records
In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently ...
Practical applications of stand-off annotation
An information system that makes use of stand-off annotation stores metadata separately from the data they describe. System architectures separate metadata from data in order to cope with heterogeneous annotations or with multimedia formats. This paper discusses some of the practical aspects of implementing an information system with a stand-off architecture. Two systems that use stand-off anno...
Integration of Data in Pathogenomics: Three Layers of cellular complexity and an XML-based Framework
For efficient integration of all data, the XML-based platform myBSMLStudio2003 is discussed and developed here. It integrates XQuery capabilities, automatic scripting updates for sequence annotation and a JESS expert system shell for functional annotation. In the context of genome annotation platforms in place (GenDB, PEDANT) these different tools and approaches presented here allow improve...
Publication date: 2006